Barn owls hunt in the dark by using cues from both sight and sound to locate their prey. This 
task is facilitated by topographic maps of the external space formed by neurons (e.g., in the 
optic tectum) that respond to visual or aural signals from a specific direction. Plasticity of these 
maps has been studied in owls forced to wear prismatic spectacles that shift their visual field. 
Adaptive behavior in young owls is accompanied by a compensating shift in the response of 
(mapped) neurons to auditory signals. We model the receptive fields of such neurons by linear 
filters that sample correlated audio-visual signals, and search for filters that maximize the gathered 
information, subject to the costs of rewiring neurons. Assuming a higher fidelity of visual 
information, we find that the corresponding receptive fields are robust and unchanged by artificial 
shifts. The shape of the aural receptive field, however, is controlled by correlations between sight 
and sound. In response to prismatic glasses, the aural receptive fields shift in the compensating 
direction, although their shape is modified due to the costs of rewiring. 



I. INTRODUCTION 

In the struggle of biological organisms to survive and reproduce, processing of information is of central importance. 
Sensory signals provide valuable information about the external world, such as the locations of predators and prey. 
Localization of sources is facilitated by topographic maps of neurons in various parts of the brain [1], reflecting the 
spatial arrangements of signals around the animal. The barn owl has to rely extensively on sounds to find its prey in 
the dark, and has consequently developed precise 'auditory space maps.' 

By extensive experiments, Knudsen and collaborators have shown that the optic tectum of the barn owl has both 
visual and aural maps of space that are in close registry [2, 3]. The visual signal plays a crucial role in aligning the 
aural map; experimental manipulations of the owl's sensory experience reveal the plasticity of these maps in young 
animals, and the instructive role played by visual experience. (A recent review, with specific references, can be 
found in Ref. [4].) The current study was motivated by experiments in which owls are fitted with prismatic spectacles 
that shift the visual field by a preset angle in the horizontal direction [5]. In young owls, the auditory 
maps were found to shift so as to remain in registry with the visual maps, which stayed unchanged. 

There is at least one theoretical attempt to explain the registry of neural maps through 'value-dependent learning,' 
where synaptic connections in a network are enhanced after 'foveation towards an auditory stimulus' [6]. In this paper 
we take a more abstract approach to the coupling of audio-visual maps, and search for neural connections (receptive 
fields) that maximize the information gained from the sensory signals. In earlier studies [7, 8], Bialek and one of us 
formulated an approach to the optimization of information in the visual system, and in computations with neural spike 
trains [9]. 

Here, we extend the methods of Ref. [7] for computing receptive fields in the visual system to finding the optimal 
connectivities in an audio-visual cortex, such as the owl's optic tectum. We find that the shape and registry of the 
aural map are established by the correlations between the audio and visual signals. In response to an artificial shift of 
the visual field (as with the prismatic spectacles), the visual receptive field is unchanged, while the aural receptive 
field shifts in the adaptive direction; its shape, however, changes due to the costs of rewiring the neurons. 

The general formalism for our calculations is set up in Sec. II A, which reviews the methodology introduced in 
Ref. [7]. The essence of this approach is the assumption that neural connections act as linear filters of the incoming 
signals, and also introduce noise in the outputs. If the (correlated) input signals and the random noise are taken from 
Gaussian probability distributions, the outputs are also Gaussian distributed. The Shannon [10] information content 
of the resulting outputs is then easily calculated. The task is to find filter functions that maximize this information, subject 
to biologically motivated costs, for given correlations of the input signals. In Ref. [7] this approach was used to 
obtain receptive fields in the visual system. In Sec. II B, we generalize this formalism to coupled audio-visual signals. 



A necessary input to the calculations is the correlations between the audio and visual signals, as discussed in 
Sec. II C. Since it is clearly much easier to localize objects by sight than by sound, it is reasonable that the information 
carried by the visual channel should far exceed the aural one. The two sources of information are, however, quite 
likely to be correlated, resulting in couplings between the corresponding filters. In the experiments on barn owls, the 
prismatic glasses shift the visual field and hence modify the correlations between the signals. We examine how such 
shifts change the filter functions (neural connectivities) that optimize the information content of the outputs. 

As argued in Sec. II D, the disparity in the strengths of visual and aural signals simplifies the search for optimal 
filters. In particular, we find that the visual receptive fields are relatively robust and unchanged, while the shape 
of the aural receptive field is the product of two terms: one reflects the correlations between sights and sounds, and 
shifts along with external displacements of these signals; the second is associated with the costs of making connections 
to distant neurons. This result is further interpreted in the final section (Sec. III), where some implications for 
experiments, as well as directions for future extensions and generalizations, are also discussed. 



II. ANALYSIS OF INFORMATION 

A. General Formalism 

The processing of information by neural connections in the cortex is modelled in Ref. [7] as follows: after passing 
through intermediate stations, sensory signals arrive as a set of inputs $\{s_j\}$. Further processing takes place by neurons 
that sample the information from a subset of these inputs and produce an appropriate output. For ease of calculation, 
the outputs are represented as a linear transformation of the inputs, according to 

$$O_i = \sum_j F_{ij}\, s_j + \eta_i . \qquad (1)$$

The filtering of information is thus parameterized by the matrix $\{F_{ij}\}$, and is also assumed to introduce an unavoidable 
noise $\eta_i$. There are of course many possible sensory inputs, which can be taken from a joint probability distribution 
$P_{\rm in}[s_j]$. Equation (1) is thus a transformation from one set of random variables (the inputs) to another (the outputs), 
the latter described by the joint probability distribution function $P_{\rm out}[O_i]$. The amount of information associated 
with a given probability distribution is quantified [10] (up to a baseline and units) by $I[P] = -\langle \ln P \rangle$, where the 
average is taken with the corresponding probability. The task of finding optimal filters is thus to come up with the 
matrix $F$ that maximizes $I[P_{\rm out}]$ for specified input and noise probabilities. 

The Shannon information can be calculated easily for Gaussian distributed random variables. Let us consider the 
set of $N$ random variables $\{x_i\}$, taken from the probability distribution 

$$P[\{x_i\}] = \sqrt{\frac{\det A}{(2\pi)^N}}\, \exp\left(-\frac{1}{2} A_{ij}\, x_i x_j\right), \qquad (2)$$

where summation over repeated indices is implicit, and $\det A$ indicates the determinant of the $N \times N$ matrix with 
elements $A_{ij}$. It is easy to check that, up to an unimportant additive constant of $\frac{N}{2}\ln(2\pi e)$, 

$$I[P] = -\frac{1}{2}\ln\det A = \frac{1}{2}\ln\det\left[\langle x_i x_j\rangle\right], \qquad (3)$$

where we have noted that the pairwise averages are related to the inverse matrix by $\langle x_i x_j\rangle = (A^{-1})_{ij}$. A linear filter 
as in Eq. (1) maps one set of Gaussian variables to new ones. Thus, if we assume that the inputs $\{s_j\}$ and the 
(independent) noise $\{\eta_i\}$ are Gaussian distributed, we can calculate the information content of the output using 
Eq. (3), with 

$$\langle O_i O_k\rangle = F_{ij} F_{kl}\, \langle s_j s_l\rangle + \langle \eta_i \eta_k\rangle . \qquad (4)$$
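As a numerical sanity check of Eqs. (3) and (4), the following Python sketch (assuming NumPy; the filter matrix, input covariance, and noise variance are hypothetical random choices, not values from the text) builds the output covariance of a linear filter with white noise and compares $\frac{1}{2}\ln\det\langle O_i O_k\rangle + \frac{N}{2}\ln(2\pi e)$ with a Monte Carlo estimate of $-\langle \ln P_{\rm out}\rangle$.

```python
import numpy as np

def gaussian_information(cov):
    """Filter-dependent part of I[P] = -<ln P> for a zero-mean Gaussian: 0.5 * ln det <x x^T>."""
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("covariance must be positive definite")
    return 0.5 * logdet

def output_covariance(F, S, noise_var):
    """Eq. (4) with white noise: <O O^T> = F <s s^T> F^T + noise_var * identity."""
    return F @ S @ F.T + noise_var * np.eye(F.shape[0])

rng = np.random.default_rng(0)
n_out, n_in, noise_var = 3, 5, 0.5
F = rng.normal(size=(n_out, n_in))        # hypothetical filter matrix F_ij
X = rng.normal(size=(n_in, n_in))
S = X @ X.T / n_in                        # hypothetical input covariance <s_j s_l>
C = output_covariance(F, S, noise_var)

# Sample O = F s + eta and average -ln P_out over the samples.
n_samples = 200_000
s = rng.multivariate_normal(np.zeros(n_in), S, size=n_samples)
eta = rng.normal(scale=np.sqrt(noise_var), size=(n_samples, n_out))
O = s @ F.T + eta
quad = np.einsum("ni,ij,nj->n", O, np.linalg.inv(C), O)
log_p = -0.5 * quad - gaussian_information(C) - 0.5 * n_out * np.log(2 * np.pi)
mc_entropy = -log_p.mean()

# Eq. (3) plus the additive constant (N/2) ln(2 pi e)
exact = gaussian_information(C) + 0.5 * n_out * np.log(2 * np.pi * np.e)
print(f"Monte Carlo: {mc_entropy:.4f}   Eq. (3) + const: {exact:.4f}")
```

The two numbers should agree to within sampling error; only the $\ln\det$ term depends on the filter, which is why the constant can be dropped when comparing filters.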

We are interested in describing cortical maps related to visual or aural localization of objects. These locations 
vary continuously in space, and are topographically mapped to positions on a two-dimensional cortex. As such, it is 
convenient to promote the indices $i$ and $j$, used above to label output and input neurons, to continuous vectors in 
two-dimensional space. For example, following Ref. [7], let us consider an image described by a scalar field $s(\mathbf{x})$ on a 
two-dimensional surface with coordinates $\mathbf{x}$. The image is sampled by an array of cells such that the output of the cell 
located at $\mathbf{x}$ is given by 

$$O(\mathbf{x}) = \int d^2y\, F(\mathbf{x}-\mathbf{y})\, s(\mathbf{y}) + \eta(\mathbf{x}), \qquad (5)$$



where the function $F(\mathbf{r})$ describes the receptive field of the cell. Assuming uncorrelated neural noise, $\langle\eta(\mathbf{x})\eta(\mathbf{x}')\rangle = 
N\delta^2(\mathbf{x}-\mathbf{x}')$, and signal correlations $\langle s(\mathbf{x})s(\mathbf{x}')\rangle = S(\mathbf{x}-\mathbf{x}')$, the filter-dependent part of the output information is 
given by 

$$\mathcal{I} = \frac{1}{2}\ln\det\left[\delta^2(\mathbf{x}-\mathbf{x}') + \int d^2y \int d^2y'\, F(\mathbf{x}-\mathbf{y})\, F(\mathbf{x}'-\mathbf{y}')\, \frac{S(\mathbf{y}-\mathbf{y}')}{N}\right]. \qquad (6)$$



Note that we have assumed that the signal is translationally invariant, such that correlations only depend on 
the relative distance between their sources. This allows us to change basis to the Fourier components, $s(\mathbf{k}) = 
\int d^2x\, \exp(-i\mathbf{k}\cdot\mathbf{x})\, s(\mathbf{x})$, which are uncorrelated for different wave-vectors $\mathbf{k}$. The overall information is then obtained 
from a sum of independent contributions and, using $\sum_{\mathbf{k}} \to A\int d^2k/(2\pi)^2$ where $A$ is the cortical area, is equal to 

$$\mathcal{I} = \frac{A}{2}\int \frac{d^2k}{(2\pi)^2}\, \ln\left[1 + |F(\mathbf{k})|^2\, S(\mathbf{k})\right], \qquad (7)$$

where $F(\mathbf{k})$ and $S(\mathbf{k})$ are the Fourier transforms of the receptive field $F(\mathbf{x})$ and of the signal-to-noise correlations $S(\mathbf{x})/N$, 
respectively. 
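The step from Eq. (6) to Eq. (7) rests on translation invariance: on a lattice, translation-invariant kernels are circulant matrices, which the discrete Fourier transform diagonalizes, so the log-determinant splits into a sum over independent modes. The sketch below checks this in a one-dimensional periodic toy version (an assumption made for illustration; the filter and signal-to-noise profile are hypothetical), using NumPy.

```python
import numpy as np

def circulant(first_col):
    """Matrix M[i, j] = first_col[(i - j) mod n]: a translation-invariant kernel on a ring."""
    n = len(first_col)
    idx = (np.arange(n)[:, None] - np.arange(n)[None, :]) % n
    return first_col[idx]

n = 64
r = np.minimum(np.arange(n), n - np.arange(n))                 # distance on the ring
f = np.exp(-(r / 2.0) ** 2) - 0.5 * np.exp(-(r / 4.0) ** 2)    # hypothetical filter F(x)
snr = 0.3 * np.exp(-r / 5.0)                                   # hypothetical S(x)/N

# Real-space form, the discrete analogue of Eq. (6)
Fm = circulant(f)
info_real = 0.5 * np.linalg.slogdet(np.eye(n) + Fm @ circulant(snr) @ Fm.T)[1]

# Fourier form, the discrete analogue of Eq. (7): the modes decouple
F_k = np.fft.fft(f)
S_k = np.fft.fft(snr).real        # real because snr is symmetric in r
info_fourier = 0.5 * np.sum(np.log1p(np.abs(F_k) ** 2 * S_k))

print(info_real, info_fourier)
```

The two expressions agree to rounding error; the continuum limit replaces the sum over lattice modes by $A\int d^2k/(2\pi)^2$.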

The task is to find the function $F(\mathbf{k})$ which maximizes the information $\mathcal{I}$. Clearly, we need to impose certain 
costs on this function, since otherwise the information gain can become arbitrarily large as $F \to \infty$. This cost ultimately 
originates from the difficulties of creating and maintaining neural connections that gather and transmit information 
over some distance, and is hard to quantify. Following Ref. [7], we shall assume that the overall cost (in appropriate 
'information' units) has the form 



$$C = \int d^2x\, \left[\lambda + C(\mathbf{x})\right] F(\mathbf{x})^2, \qquad C(\mathbf{x}) = \mu x^2 . \qquad (8)$$

This expression can be regarded as an expansion in powers of $F$ and $\mathbf{x}$, with the assumption that the cost is invariant 
under changing the sign of $F$, and independent of the direction of the vector $\mathbf{x}$. It imposes a penalty for creating 
connections which increases quadratically with the length of the connection. Our central conclusion is in fact insensitive 
to the form of $C(\mathbf{x})$. 

If the costs are prohibitive, there will be no filtering of signals. To avoid such cases, we compare only filters that are 
constrained such that $\int d^2x\, F(\mathbf{x})^2 = 1$ (or any other constant). In the optimization process, this constraint can be 
implemented via a Lagrange multiplier, resulting in an effective cost similar to the term proportional to $\lambda$ in Eq. (8). 
Thus, this term and the constraint can be used interchangeably. In Ref. [7] it was shown that optimizing Eq. (7) 
subject to the cost of Eq. (8), in the limit of low signal to noise, is equivalent to solving a Schrödinger equation, with 
$F(\mathbf{k})$ playing the role of the wave function in a potential $-S(\mathbf{k})$, and the Lagrange multiplier set by the 
ground-state energy. A potential of the form $S(\mathbf{k}) \propto k^{-2}$ was used there to obtain receptive fields with on-center/off-surround 
character. In the next section we generalize this approach by considering correlated visual and aural inputs. 
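The Schrödinger analogy can be illustrated numerically. In the low signal-to-noise limit, maximizing $\int dk\,[S(k)|F|^2 - \mu|\partial_k F|^2]$ at fixed $\int dk\,|F|^2$ is the eigenvalue problem $[S(k) + \mu\,\partial_k^2]F = \lambda F$, whose top eigenvector is the ground state of $-\mu\,\partial_k^2 - S(k)$. The sketch below is a one-dimensional toy version (an assumption; the text works in two dimensions) with a hypothetical regularized spectrum $S(k) = 0.05/(k^2 + 0.5)$ and hard-wall boundaries on a finite $k$-grid, using NumPy's dense symmetric eigensolver.

```python
import numpy as np

def optimal_filter_low_snr(S_k, dk, mu):
    """Top eigenpair of S(k) + mu * d^2/dk^2 on a uniform grid (F = 0 beyond the grid ends).

    Maximizing  sum[S |F|^2 - mu |dF/dk|^2] dk  at fixed  sum |F|^2 dk  is this eigenproblem;
    the top eigenvector is the ground state of the 'Hamiltonian' -mu d^2/dk^2 - S(k).
    """
    n = len(S_k)
    off = np.ones(n - 1)
    lap = (np.diag(off, 1) + np.diag(off, -1) - 2.0 * np.eye(n)) / dk**2
    H = np.diag(S_k) + mu * lap
    evals, evecs = np.linalg.eigh(H)          # eigenvalues in ascending order
    lam, F = evals[-1], evecs[:, -1]
    F = F / np.sqrt(np.sum(F**2) * dk)        # normalization constraint
    return lam, F * np.sign(F[n // 2])        # fix the arbitrary overall sign

k = np.linspace(-10.0, 10.0, 401)
dk = k[1] - k[0]
S_k = 0.05 / (k**2 + 0.5)                     # hypothetical regularized 1/k^2 spectrum
lam, F_k = optimal_filter_low_snr(S_k, dk, mu=1e-3)
print(f"lambda = {lam:.4f}, max S = {S_k.max():.4f}")
```

The wiring cost ($\mu$) plays the role of a kinetic energy, so the multiplier $\lambda$ always lies below the peak of $S(k)$; larger $\mu$ pushes it further down and broadens $F(k)$.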



B. Coupled Audio-Visual Inputs 



In our idealized model, a neuron in the optic tectum of the owl filters input signals coming from both the visual 
and auditory systems, and its output is given by the generalization of Eq. (5) to 

$$O(\mathbf{x}) = \int d^2y\, F_\alpha(\mathbf{x}-\mathbf{y})\, s_\alpha(\mathbf{y}) + \eta(\mathbf{x}), \qquad (9)$$

where $\alpha$ is summed over $A$ and $V$ for audio and visual signals, respectively. Assuming as before that the signals $s_\alpha$ 
and the noise $\eta$ are independent, correlations of the output are obtained as 



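$$\langle O(\mathbf{x}_1)\, O(\mathbf{x}_2)\rangle = \int d^2y_1 \int d^2y_2\, F_\alpha(\mathbf{x}_1-\mathbf{y}_1)\, F_\beta(\mathbf{x}_2-\mathbf{y}_2)\, S_{\alpha\beta}(\mathbf{y}_1-\mathbf{y}_2) + N\delta^2(\mathbf{x}_1-\mathbf{x}_2). \qquad (10)$$

For translationally invariant signals, the output information is given by the generalization of Eq. (7) to 

$$\mathcal{I} = \frac{A}{2}\int \frac{d^2k}{(2\pi)^2}\, \ln\left[1 + F_\alpha(\mathbf{k})\, S_{\alpha\beta}(\mathbf{k})\, F_\beta(-\mathbf{k})\right], \qquad (11)$$

where $S_{\alpha\beta}(\mathbf{k})$ is a $2 \times 2$ matrix of (Fourier transformed) signal-to-noise correlations. 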

Once more, we have to impose some constraints in order to make the maximization of the information in Eq. (11) 
with respect to the functions $F_V$ and $F_A$ biologically meaningful. In principle, there could be different costs for 
connections processing aural and visual signals. In the absence of concrete data, we make the simple choice of using 
the same form as Eq. (8) for both sets of filters, so that the overall cost is 

$$C = \int d^2x\, \left(\lambda + \mu x^2\right) F_\alpha(\mathbf{x})^2 = \int \frac{d^2k}{(2\pi)^2}\left[\lambda\, |F_\alpha(\mathbf{k})|^2 + \mu\, |\nabla_k F_\alpha(\mathbf{k})|^2\right]. \qquad (12)$$



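The first term in the above cost function can again be interpreted as a Lagrange multiplier $\lambda$ imposing a normalization 
constraint 

$$\int d^2x\, F_\alpha(\mathbf{x})^2 = \int \frac{d^2k}{(2\pi)^2}\, F_\alpha(\mathbf{k})\, F_\alpha(-\mathbf{k}) = 1. \qquad (13)$$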



C. Signal Correlations 



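To proceed further, we need the matrix of signal-to-noise correlations, which has the form 

$$S(\mathbf{k}) = \begin{pmatrix} S_V(\mathbf{k}) & R(\mathbf{k})\, e^{-i\mathbf{k}\cdot\mathbf{c}} \\ R(\mathbf{k})\, e^{i\mathbf{k}\cdot\mathbf{c}} & S_A(\mathbf{k}) \end{pmatrix}. \qquad (14)$$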



The diagonal terms represent the self-correlations of each signal. Since many sources generate both sight and sound, 
the audio and visual signals will be correlated. These correlations are captured by the off-diagonal term $R(\mathbf{k})$. In 
the experiments on owls [5], the visual signal is artificially displaced by a fixed angle in the horizontal direction. If we 
indicate this angle by the vector $\mathbf{c}$, an aural signal at location $\mathbf{x}$ becomes correlated with a visual signal at $\mathbf{x} + \mathbf{c}$. 
After Fourier transformation, this shift appears as the phase factors $\exp(\mp i\mathbf{k}\cdot\mathbf{c})$ in the off-diagonal terms of the 
correlation matrix. 
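The origin of the phase factor can be checked directly. The sketch below (a one-dimensional periodic grid with hypothetical values of $R$ and of the shift; the correlation is taken as a function of $r = x_1 - x_2$, visual position minus aural position) samples $\langle s_V(x_1)s_A(x_2)\rangle = R\,\delta(x_1 - x_2 - c)$, transforms it with the convention $S(k) = \int dr\, e^{-ikr} S(r)$ used in the text, and compares the result with $R\,e^{-ikc}$ using NumPy's FFT.

```python
import numpy as np

n, dx = 128, 0.1                 # hypothetical 1D periodic grid
m_c = 7                          # hypothetical shift, c = m_c * dx
c, R = m_c * dx, 0.02            # R: hypothetical coupling strength

# <s_V(x1) s_A(x2)> = R * delta(x1 - x2 - c) as a function of r = x1 - x2
corr_VA = np.zeros(n)
corr_VA[m_c] = R / dx            # discrete delta function of unit area
S_VA = np.fft.fft(corr_VA) * dx  # continuum convention: S(k) = sum_r e^{-ikr} S(r) dx
k = 2 * np.pi * np.fft.fftfreq(n, d=dx)

S_AV = np.conj(S_VA)             # opposite off-diagonal entry, by Hermiticity
print(np.allclose(S_VA, R * np.exp(-1j * k * c)))
```

Because numpy.fft.fft uses the kernel $e^{-2\pi i jm/n}$, multiplying by the grid spacing reproduces the continuum sign convention, so the shifted delta function turns into a pure phase with unit-modulus factor and amplitude $R$.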

So far, we have treated sight and sound on the same footing. It is reasonable to assume that under most (well-lit) 
conditions the quality of visual information is much higher than that of aural information. For ease of computation, we shall 
further assume that the actual signal-to-noise ratio is quite small, resulting in the set of inequalities 

$$S_A(\mathbf{k}) \ll R(\mathbf{k}) \ll S_V(\mathbf{k}) \ll 1. \qquad (15)$$



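In this limit of small signal to noise, the logarithm in Eq. (11) can be approximated by its argument (without the 
one), resulting in a quadratic form in the filter functions. Subtracting the costs of Eq. (12), absorbing constant factors into $\lambda$ and $\mu$, 
and using $F_\alpha(-\mathbf{k}) = F_\alpha^*(\mathbf{k})$ for real receptive fields, our task then comes down to maximizing the function 

$$W = \int \frac{d^2k}{(2\pi)^2}\Big\{ S_V(\mathbf{k})|F_V(\mathbf{k})|^2 + S_A(\mathbf{k})|F_A(\mathbf{k})|^2 + R(\mathbf{k})\left[F_V(\mathbf{k})F_A^*(\mathbf{k})\,e^{-i\mathbf{k}\cdot\mathbf{c}} + F_V^*(\mathbf{k})F_A(\mathbf{k})\,e^{i\mathbf{k}\cdot\mathbf{c}}\right] - \lambda\left[|F_V(\mathbf{k})|^2 + |F_A(\mathbf{k})|^2\right] - \mu\left[|\nabla_k F_V(\mathbf{k})|^2 + |\nabla_k F_A(\mathbf{k})|^2\right]\Big\} \qquad (16)$$

with respect to $F_V$ and $F_A$. 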
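To see which way the correlation term pulls the aural filter, one can evaluate the cross term $R\,[F_V F_A^* e^{-i\mathbf{k}\cdot\mathbf{c}} + \text{c.c.}]$ of the low signal-to-noise objective for two trial aural filters, one displaced to $+c$ and one to $-c$. The sketch below is a one-dimensional toy (Gaussian trial filters and all parameter values are hypothetical) using NumPy; it approximates the continuum transform $F(k) = \int dx\, e^{-ikx}F(x)$ on a periodic grid.

```python
import numpy as np

n, dx = 256, 0.05
x = (np.arange(n) - n // 2) * dx              # periodic grid centred on x = 0
k = 2 * np.pi * np.fft.fftfreq(n, d=dx)
c, R, width = 1.0, 1e-3, 0.3                  # hypothetical shift, coupling, RF width

def ft(f_x):
    """Approximate F(k) = integral of exp(-ikx) F(x) dx for samples on the grid x."""
    return np.fft.fft(f_x) * dx * np.exp(-1j * k * x[0])

def cross_term(F_V, F_A):
    """Cross term of the low-S/N objective: R [F_V F_A^* e^{-ikc} + c.c.], summed over k."""
    return np.sum(2 * R * np.real(F_V * np.conj(F_A) * np.exp(-1j * k * c)))

F_V = ft(np.exp(-(x / width) ** 2))
aligned = cross_term(F_V, ft(np.exp(-((x - c) / width) ** 2)))   # F_A peaked at x = +c
opposite = cross_term(F_V, ft(np.exp(-((x + c) / width) ** 2)))  # F_A peaked at x = -c
print(aligned, opposite)
```

With this sign convention the gain comes almost entirely from the aural filter displaced to $+c$, i.e. $F_A(x) \propto F_V(x - c)$; the filter at $-c$ gains essentially nothing, since it overlaps the correlated region only through Gaussian tails a distance $2c$ apart.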



D. Results 



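The optimal filters are obtained from functional derivatives of Eq. (16). Setting the variation with respect to 
$F_V^*(\mathbf{k})$ to zero gives 

$$\left[S_V(\mathbf{k}) + \mu\nabla_k^2\right] F_V(\mathbf{k}) = \lambda F_V(\mathbf{k}) - \left\{R(\mathbf{k})\, F_A(\mathbf{k})\, e^{i\mathbf{k}\cdot\mathbf{c}}\right\}, \qquad (17)$$

while $\delta W/\delta F_A^* = 0$ leads to 

$$\left(\lambda - \mu\nabla_k^2\right) F_A(\mathbf{k}) = R(\mathbf{k})\, F_V(\mathbf{k})\, e^{-i\mathbf{k}\cdot\mathbf{c}} + \left\{S_A(\mathbf{k})\, F_A(\mathbf{k})\right\}. \qquad (18)$$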



In arranging the above equations, we have placed within curly brackets terms that are much smaller according to the 
hierarchy of inequalities in Eq. (15). Note that in the absence of any correlations between the two signals ($R = 0$), 
$F_A = 0$, since the aural signal is assumed to be much weaker than the visual one. Any non-zero $F_A$ reduces $F_V$ through 
the normalization condition, resulting in a smaller value of $W$. It is indeed the correlations between the two signals 
that lead to a finite value of $F_A$, of the order of $R/S_V$ (since $\lambda \sim O(S_V)$, as shown below). 

[Figure: aural receptive field profile]
FIG. 1. An aural receptive field with two peaks, obtained from Eq. (22) for $l = 0.5$, $L = 0.1$, and $c = 1.1$. 

To leading order, Eq. (17) is the Schrödinger equation obtained in Ref. [7] for the visual receptive field. Without 
further discussion, we shall indicate its solution by 

$$F_V(\mathbf{x}) = F_V^0(\mathbf{x}), \qquad \lambda = E_V = O(S_V). \qquad (19)$$



Note that we do not imply that cells in the optic tectum should have a receptive field for visual signals identical to 
that in the visual cortex. The quality of signals, the costs of neural connections, and the response of the cells may 
well vary from one cortical area to another. The eigenvalue $E_V$ is controlled by the strength of the visual correlations 
and is of the order of $S_V$. 

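To simplify the solution to Eq. (18), we first assume that $R(\mathbf{k}) = R$, a constant independent of $\mathbf{k}$. This is quite 
a reasonable assumption, corresponding to visual and aural signals that are correlated only if coming from the same 
direction, i.e., with $\langle s_V(\mathbf{x}_1)\, s_A(\mathbf{x}_2)\rangle = R\,\delta^2(\mathbf{x}_1 - \mathbf{x}_2)$ in the absence of the prisms. We can then Fourier transform both sides 
of Eq. (18), neglecting the small term in curly brackets, to obtain 

$$\left(E_V + \mu x^2\right) F_A(\mathbf{x}) = R\, F_V^0(\mathbf{x} - \mathbf{c}), \qquad (20)$$

and, quite generally, for an arbitrary form of the cost function in Eq. (8), the solution is 

$$F_A(\mathbf{x}) = \frac{R}{E_V + C_A(\mathbf{x})}\, F_V^0(\mathbf{x} - \mathbf{c}), \qquad (21)$$

where $C_A(\mathbf{x})$ denotes the distance-dependent cost of aural connections ($\mu x^2$ for the choice made above). 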



Due to the quadratic form of Eq. (16), the above result is the linear response of the system to the correlations between 
signals. 
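The Fourier step leading to Eqs. (20) and (21) can be spelled out as a short derivation. This is a sketch assuming the transform convention $F(\mathbf{k}) = \int d^2x\, e^{-i\mathbf{k}\cdot\mathbf{x}} F(\mathbf{x})$ adopted above, a constant $R$, and neglect of the bracketed $S_A F_A$ term; it relies on two standard identities:

$$\nabla_k^2 F(\mathbf{k}) = \int d^2x\, (-x^2)\, e^{-i\mathbf{k}\cdot\mathbf{x}}\, F(\mathbf{x}), \qquad e^{-i\mathbf{k}\cdot\mathbf{c}}\, F(\mathbf{k}) = \int d^2x\, e^{-i\mathbf{k}\cdot\mathbf{x}}\, F(\mathbf{x} - \mathbf{c}),$$

so that

$$\left(\lambda - \mu\nabla_k^2\right) F_A(\mathbf{k}) = R\, e^{-i\mathbf{k}\cdot\mathbf{c}}\, F_V(\mathbf{k}) \quad\Longrightarrow\quad \left(\lambda + \mu x^2\right) F_A(\mathbf{x}) = R\, F_V(\mathbf{x} - \mathbf{c}).$$

Setting $\lambda = E_V$ and $F_V = F_V^0$ gives Eq. (20); since the right-hand side is local in $\mathbf{x}$, replacing $\mu x^2$ by a general $C_A(\mathbf{x})$ and dividing through gives Eq. (21).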

The significance of our result is that the aural receptive field $F_A(\mathbf{x})$ is not simply the visual receptive field shifted 
by $\mathbf{c}$, as one might have guessed. Rather, the shape of $F_A(\mathbf{x})$ could be significantly distorted by the cost function 
$C_A(\mathbf{x})$. At the moment, the data may be too crude to determine the shape of $F_A(\mathbf{x})$, but it is still worthwhile 
to contemplate what sort of shape distortion may result in our simple model. For illustrative purposes, let us 
take $F_V^0(\mathbf{x}) \propto \exp(-\mathbf{x}^2/l^2)$ to be a Gaussian, with $l$ a length scale characteristic of the visual receptive field, and 
$C_A(\mathbf{x}) = \mu x^2$. Then we predict (with $\mathbf{c} = (c, 0)$ and $\mathbf{x} = (x, y)$) 



$$F_A(x, y) \propto \frac{1}{1 + (x^2 + y^2)/L^2}\, \exp\left[-\frac{(x - c)^2 + y^2}{l^2}\right], \qquad (22)$$

where $L = \sqrt{E_V/\mu}$ defines a length scale characteristic of the relative cost of connecting distant neurons. While there 
are three length scales $L$, $l$, and $c$ involved, the shape of $F_A(x, y)$ depends only on the two ratios $L/l$ and $c/l$. 

We now qualitatively describe the change in the shape of the aural receptive field in Eq. (22) as the imposed shift 
$c$ is varied, as in the experiments of Knudsen et al. (The exact analysis of the extremal points of Eq. (22) involves 
the solution of a cubic equation, which will not be given here.) Two types of behavior are possible, depending on the 
ratio $l/L$. For $l \ll L$, where the cost of rewiring is negligible, the function $F_A(x, y)$ has a single maximum located at 
$x \approx c$ (and $y = 0$), i.e., simply following the imposed shift. When $l \gg L$, however, there is an intermediate range of 
values of $c$ where the aural receptive field has two peaks, one close to the origin, at $x_- \approx cL^2/l^2 \ll c$, and another close 
to $x_+ \approx c$. A typical profile with two peaks is depicted in Fig. 1. 
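The cubic mentioned above can be solved numerically. Along the axis $y = 0$, setting $d\ln F_A/dx = 0$ in Eq. (22) gives $(c - x)(L^2 + x^2) = l^2 x$, i.e. $x^3 - c\,x^2 + (L^2 + l^2)\,x - cL^2 = 0$; a real root is a maximum where the cubic's slope is positive, because the cubic has the opposite sign to $d\ln F_A/dx$. This is our own derivation, offered as a sketch; the helper names are hypothetical, and NumPy's `np.roots` does the root finding.

```python
import numpy as np

def aural_peaks(l, L, c):
    """x-positions of the maxima of F_A(x, 0) in Eq. (22), sorted in increasing order.

    Stationary points solve  x^3 - c x^2 + (L^2 + l^2) x - c L^2 = 0;
    maxima are the real roots at which this cubic has positive slope.
    """
    coeffs = [1.0, -c, L**2 + l**2, -c * L**2]
    roots = np.roots(coeffs)
    real = np.sort(roots[np.abs(roots.imag) < 1e-9].real)
    slope = 3 * real**2 - 2 * c * real + (L**2 + l**2)
    return real[slope > 0]

def aural_profile(x, y, l, L, c):
    """Unnormalized F_A(x, y) of Eq. (22)."""
    return np.exp(-((x - c) ** 2 + y**2) / l**2) / (1 + (x**2 + y**2) / L**2)

print(aural_peaks(l=0.5, L=0.1, c=1.1))   # parameters of Fig. 1: two maxima
print(aural_peaks(l=0.5, L=5.0, c=1.1))   # cheap rewiring (l << L): a single maximum
```

For the parameters of Fig. 1 this yields one peak close to the origin and one a little short of $x = c$, while for $l \ll L$ the cubic has a single real root near $c$. Scanning `c` with `aural_peaks` maps out the intermediate range in which the two-peaked profile appears.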
III. DISCUSSIONS 

Equation (21) is the central result of our study. It provides the optimal linear filter for a weak signal correlated with 
a stronger one. Some specific features of this result in connection with the coupled visual and aural maps are: 

• The shape of the aural receptive field is very much controlled by the visual information, modulated by the costs 
associated with neural connectivities. 

• Artificially displacing the two signal sources, as in the case of the prismatic spectacles used on the barn owls [5], 
modifies the aural receptive field. However, the resulting receptive field is not simply shifted (unless the costs 
of neural wiring are negligible), but also changes its shape. 

• Equation (21) is the product of two functions, one peaked at the origin and the other at $\mathbf{x} = \mathbf{c}$. Depending on 
the relative strengths and widths of these two peaks, the receptive field may be more sensitive to signals at the 
original or at the shifted location. 

• The experiments find, not surprisingly, that adaptation to the prismatic glasses depends strongly on the age of 
the individual owl. This feature can be incorporated in our model with the reasonable assumption that the cost 
of neural connections increases with age of the individual. 

This work is a small step towards providing a quantitative framework for deducing the workings of the brain, starting 
from the tasks that it has to perform for the organism to function in its natural habitat. In this framework, the tasks 
of the sensory systems are more apparent: to extract the relevant signals from the background of natural inputs 
and, as a first step, to localize the source of the signal in the external world. It is possible to gather experimental 
information about the correlations of various signals in the natural world, and there are indeed several studies of the 
statistics of various aspects of visual images [11]. Of course, such statistics are also specific to the instrument (e.g., a 
camera) used to obtain the image. More relevant are psychophysical studies that probe how individuals parse 
visual information [12]. We are not aware of similar studies on the statistics of natural sounds in different directions, 
and their correlations with visual signals. Such studies may provide part of the material needed for a more detailed 
study. 

The outcome of the procedure outlined in this paper is a set of filter functions, which are hopefully related to the 
actual connections between neurons. The shape and range of such connections can be studied directly by injection 
of biocytin dye [13], and indirectly by mapping the receptive field of a neuron via a microelectrode probe. Detailed 
studies of this kind for owls reared with prismatic spectacles, and their comparison with Eq. (21), may provide 
insights about the cost of making neural connections, another necessary input to our general formalism. 

The analytical formalism itself can be extended in several directions. Already, in Ref. [7] it was proposed that 
colored images can be studied by considering a vector signal $\mathbf{s}$ ranging over the color wheel. With regard to different 
sensory inputs, we can also ask if and when it is advantageous to segregate outputs to distinct cortical areas, 
allowing for distinct maps $\{O_\alpha\}$. A more ambitious goal is to extend the formalism to time-dependent signals, allowing 
for filters with appropriate time delays that attempt to take advantage of temporal patterns in the signals. 



Acknowledgments 

This work was supported in part by the NSF under grant numbers DMR-01-18213 (MK), PHY89-04035 and 
PHY95-07065 (AZ). 



References 

[1] J.H. Kaas and T.A. Hackett, J. Comp. Neurol. 421, 143 (2000). 
[2] E.I. Knudsen, J. Neurosci. 2, 1177 (1982). 
[3] E.I. Knudsen, Science 222, 939 (1983). 
[4] E.I. Knudsen, Nature 417, 322 (2002). 
[5] M.S. Brainard and E.I. Knudsen, J. Neurosci. 18, 3929 (1998); E.I. Knudsen and M.S. Brainard, Science 253, 85 (1991). 
[6] M. Rucci, G. Tononi, and G.M. Edelman, J. Neurosci. 17, 334 (1997). 
[7] W. Bialek, D.L. Ruderman, and A. Zee, in Advances in Neural Information Processing Systems, edited by R.P. Lippmann et al. (Morgan Kaufmann Publishers, San Mateo, 1991), p. 363. 
[8] W. Bialek and A. Zee, Phys. Rev. Lett. 61, 1512 (1988). 
[9] W. Bialek and A. Zee, J. Stat. Phys. 59, 103 (1990). 
[10] C.E. Shannon and W. Weaver, The Mathematical Theory of Communication (University of Illinois Press, Urbana, IL, 1949). 






[11] M. Sigman, G.A. Cecchi, C.D. Gilbert, and M.O. Magnasco, Proc. Natl. Acad. Sci. 98, 1935 (2001). 
[12] J. Malik, D. Martin, C. Fowlkes, and D. Tal, "A Database of Human Segmented Natural Images and its Application 
to Evaluating Segmentation Algorithms and Measuring Ecological Statistics," submitted to International Conference on 
Computer Vision, 2001. [Also available as Technical Report No. UCB/CSD- 1-1133, Computer Science Division, University 
of California at Berkeley, January 2001.] 
[13] W.M. DeBello, D.E. Feldman, and E.I. Knudsen, J. Neurosci. 21, 3161 (2001). 



